[URLSession]Fix for a timeout bug(SR-2681) #911

pushkarnk · 2017-03-07T12:20:24Z

Currently, we use the URLSessionConfiguration.timeoutIntervalForRequest value to trigger a timeout. https://bugs.swift.org/browse/SR-2681 reported a bug in which the timeout counter is not reset on every data transfer.

This is from the API doc:

The request timeout interval controls how long (in seconds) a task should wait for additional data to
 arrive before giving up. The timer associated with this value is reset whenever new data arrives. 
When the request timer reaches the specified interval without receiving any new data, it triggers a timeout.

This patch tries to reuse the _TimeoutSource class which is currently used to trigger timeouts in MultiHandle, with some modifications. We reset the timer every time data is available.

pushkarnk · 2017-03-07T12:26:02Z

With the current URLSession testing infrastructure, a test case for this, is a challenge.

pushkarnk · 2017-03-07T12:26:54Z

@parkera @naithar Can you please review and comment?

naithar · 2017-03-07T13:27:14Z

I have no comments on implementation, except for adding guard `self` = self else { return } inside DispatchWorkerItem closure. But i'm not really sure if it actually affects anything

As for testing, if I understand the problem correctly, TestURLSessionServer initializer could receive a delay interval for response, like init(port: UInt16, delay: TimeInterval? = nil) which could be used by .readAndRespond() method to delay actual response.
Here's the changes for TestURLSessionServer class and runServer(...) method that seem to correctly implement the timeout testing (I've changed nil default to 5 to test it): https://gist.github.com/naithar/7f26c24dd08c312858d13d4f5e97f728

naithar · 2017-03-07T14:23:55Z

Oh, now I see that i didn't understand the problem with tests correctly.

pushkarnk · 2017-03-07T14:47:51Z

Thanks @naithar I have added the guard

naithar · 2017-03-07T15:03:50Z

@pushkarnk
Again about testing. Seems like i've started digging in a right place actually.
If we can change a func writeData(data: String) method to split resulted string in chunks and send it with a delay this could emulate the request to http://httpbin.org/drip?duration=10&numbytes=10&code=200 which is used in bug report. But currently i managed only to send multiple data parameter in a single response.

var bytes = Array(data.utf8)


for _ in 0..<5 {
    sleep(1)
    _  = try? attempt("write", valid: isNotNegative, CInt(write(connectionSocket, &bytes, bytes.count)))
}

parkera · 2017-03-07T16:53:50Z

The diff looks reasonable but we should be able to get some kind of unit test in place for this.

pushkarnk · 2017-03-08T06:28:13Z

@naithar Thanks for your work. I will try out your suggestion. I was hoping that we can also delay the response as in:

    public func respond(with response: _HTTPResponse, delay: Int = 0) throws {
        let semaphore = DispatchSemaphore(value: 0)
        let deadlineTime = DispatchTime.now() + .seconds(delay)
        DispatchQueue.main.asyncAfter(deadline: deadlineTime) {
            do {
                try self.socket.writeData(data: response.description)
                semaphore.signal()
            } catch { }
        }
        semaphore.wait()
     }

I did this and I see a crash. Investigating it now.

naithar · 2017-03-09T06:27:34Z

Does crash message anything? Maybe "Trying to access a behaviour for a task that in not in the registry."? I can sometimes reproduce this by just delaying response for long enough.

This sometimes happens, because .taskCompleted state is set before .transferFailed, so task is no longer in registry when .completeTask(...) method is called.

naithar · 2017-03-09T07:53:07Z

Not sure what happened, but reapplying a PR to current master removed the crash.
This commit implements a chunked response, so this PR could be tested. It's ugly but it works.

pushkarnk · 2017-03-09T09:16:11Z

@naithar Yes, I did see that fatal error. Nevertheless, your patch looks good and the test runs well. I am wondering how we go about. I'd suggest we have this PR merged first and then you create a PR with your patch.

pushkarnk · 2017-03-09T09:17:31Z

I also realised that I need to cancel the timer when completeTask() is called.

naithar · 2017-03-09T09:20:07Z

@pushkarnk i can also create a PR with test commented out. Actually i'm not agains you taking this test implementation into this PR.

pushkarnk · 2017-03-09T09:24:22Z

You creating the PR with calls to the new tests commented out is a better idea, in my opinion :)

naithar · 2017-03-09T09:25:10Z

Then I'll do it now.

naithar · 2017-03-09T09:36:13Z

Submitted #913

pushkarnk · 2017-03-10T10:38:30Z

I can consistently see the "Trying to access a behaviour for a task that in not in the registry." failure with @naithar 's test case from #913 . Investigating. It is probably related to the previous test_cancelTask test and the fact that we use the same URLSession object.

naithar · 2017-03-10T10:55:24Z

That's strange.
I can see same failure. But moving test_taskTimeout test to the top of the tests array actually solves the issue.

I'll look into it a little more.

pushkarnk · 2017-03-10T10:58:18Z

Yes, looks like we need to cancel the timeoutTimer in complete(withError:) too.

naithar · 2017-03-10T10:59:15Z

Actually test_taskTimeout causes failure only if it follows test_cancelTask test. Atm I'm not sure what exactly causing the failure.

Also moving test_cancelTask to the top causes test_dataTaskWithURL fail for the same reason as timeout test - Trying to access a behaviour for a task that in not in the registry

naithar · 2017-03-10T11:37:42Z

Running Cancel test actually causes this error for TestURLSessionServer:

bind failed: Unknown error: -1 (-1)
  - operation : "bind"
  - errno : -1
  - file : "HTTPServer.swift"
  - line : 69

It's not displayed or handled because server is starting with this code (i guess the problem is in XCTAssertTrue(true)), so it won't show anything:

globalDispatchQueue.async {
            do {
                try self.runServer(with: serverReady)
            } catch {
                XCTAssertTrue(true)
                return
            }
        }

Also not signaling semaphore in catch in runServer(...) method causes tests to hang sometimes.

@pushkarnk I see several workarounds:

Make cancel test run last.
Disable server start for cancel test (I guess server just hangs and waits for request, which never happen in test_cancelTask)
Somehow change TestURLSessionServer or runServer(...) so it won't hang.

naithar · 2017-03-10T11:52:15Z

Switching

} catch let e as ServerError {
    if e.operation != "bind" { continue }

with

} catch let e as ServerError {
    if e.operation == "bind" { continue }

in runServer(...) method seems like a workaround and removes the error and crashes.

pushkarnk · 2017-03-10T12:16:09Z

Thanks @naithar That seems like a fix (not a workaround). We retry on a bind failure and throw an error otherwise.

@naithar

runServer must retry on a bind failure - thanks to Sergey (@naithar)

pushkarnk · 2017-03-10T12:51:07Z

@swift-ci please test

pushkarnk · 2017-03-10T13:38:33Z

@swift-ci please test and merge

pushkarnk force-pushed the timeout-bug branch 2 times, most recently from 1293512 to f04bfc1 Compare March 7, 2017 12:24

pushkarnk force-pushed the timeout-bug branch from f04bfc1 to d2970b4 Compare March 7, 2017 14:41

naithar mentioned this pull request Mar 9, 2017

[Test][NSURLSession] Chunked response #913

Merged

pushkarnk force-pushed the timeout-bug branch 2 times, most recently from 5b5bc08 to d4e8ecd Compare March 9, 2017 10:03

[URLSession]Fix for a timeout bug(SR-2681)

ab3f907

runServer must retry on a bind failure - thanks to Sergey (@naithar)

pushkarnk force-pushed the timeout-bug branch from d4e8ecd to ab3f907 Compare March 10, 2017 12:48

swift-ci merged commit 41542b2 into swiftlang:master Mar 10, 2017

pushkarnk mentioned this pull request Mar 14, 2017

Ensure curl headers are not overwritten but merged in special case. #915

Closed

pushkarnk deleted the timeout-bug branch May 4, 2017 09:51

pushkarnk mentioned this pull request Mar 14, 2017

[SR-2681] corelibs-foundation URLSession timeout value measures the wrong thing #4553

Closed

[URLSession]Fix for a timeout bug(SR-2681) #911

[URLSession]Fix for a timeout bug(SR-2681) #911

Uh oh!

Conversation

pushkarnk commented Mar 7, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pushkarnk commented Mar 7, 2017

Uh oh!

pushkarnk commented Mar 7, 2017

Uh oh!

naithar commented Mar 7, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

naithar commented Mar 7, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pushkarnk commented Mar 7, 2017

Uh oh!

naithar commented Mar 7, 2017

Uh oh!

parkera commented Mar 7, 2017

Uh oh!

pushkarnk commented Mar 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

naithar commented Mar 9, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

naithar commented Mar 9, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pushkarnk commented Mar 9, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pushkarnk commented Mar 9, 2017

Uh oh!

naithar commented Mar 9, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pushkarnk commented Mar 9, 2017

Uh oh!

naithar commented Mar 9, 2017

Uh oh!

naithar commented Mar 9, 2017

Uh oh!

pushkarnk commented Mar 10, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

naithar commented Mar 10, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pushkarnk commented Mar 10, 2017

Uh oh!

naithar commented Mar 10, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

naithar commented Mar 10, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

naithar commented Mar 10, 2017

Uh oh!

pushkarnk commented Mar 10, 2017

Uh oh!

pushkarnk commented Mar 10, 2017

Uh oh!

pushkarnk commented Mar 10, 2017

Uh oh!

Uh oh!

pushkarnk commented Mar 7, 2017 •

edited

Loading

naithar commented Mar 7, 2017 •

edited

Loading

naithar commented Mar 7, 2017 •

edited

Loading

pushkarnk commented Mar 8, 2017 •

edited

Loading

naithar commented Mar 9, 2017 •

edited

Loading

naithar commented Mar 9, 2017 •

edited

Loading

pushkarnk commented Mar 9, 2017 •

edited

Loading

naithar commented Mar 9, 2017 •

edited

Loading

pushkarnk commented Mar 10, 2017 •

edited

Loading

naithar commented Mar 10, 2017 •

edited

Loading

naithar commented Mar 10, 2017 •

edited

Loading

naithar commented Mar 10, 2017 •

edited

Loading